incomplete data
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.05)
- Asia > China > Hong Kong (0.04)
- (2 more...)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Boosting Spectral Clustering on Incomplete Data via Kernel Correction and Affinity Learning
Spectral clustering has gained popularity for clustering non-convex data due to its simplicity and effectiveness. It is essential to construct a similarity graph using a high-quality affinity measure that models the local neighborhood relations among the data samples. However, incomplete data can lead to inaccurate affinity measures, resulting in degraded clustering performance. To address these issues, we propose an imputation-free framework with two novel approaches to improve spectral clustering on incomplete data. Firstly, we introduce a new kernel correction method that enhances the quality of the kernel matrix estimated on incomplete data with a theoretical guarantee, benefiting classical spectral clustering on pre-defined kernels. Secondly, we develop a series of affinity learning methods that equip the self-expressive framework with $\ell_p$-norm to construct an intrinsic affinity matrix with an adaptive extension. Our methods outperform existing data imputation and distance calibration techniques on benchmark datasets, offering a promising solution to spectral clustering on incomplete data in various real-world applications.
GeoMAE: Masking Representation Learning for Spatio-Temporal Graph Forecasting with Missing Values
Ke, Songyu, Wu, Chenyu, Liang, Yuxuan, Qin, Huiling, Zhang, Junbo, Zheng, Yu
The ubiquity of missing data in urban intelligence systems, attributable to adverse environmental conditions and equipment failures, poses a significant challenge to the efficacy of downstream applications, notably in the realms of traffic forecasting and energy consumption prediction. Therefore, it is imperative to develop a robust spatio-temporal learning methodology capable of extracting meaningful insights from incomplete datasets. Despite the existence of methodologies for spatio-temporal graph forecasting in the presence of missing values, unresolved issues persist. Primarily, the majority of extant research is predicated on time-series analysis, thereby neglecting the dynamic spatial correlations inherent in sensor networks. Junbo Zhang is the corresponding author. This research was done when the first author was an intern at JD Intelligent Cities Research & JD iCity under the supervision of the fifth author. The model is comprised of three principal components: an input preprocessing module, an attention-based spatio-temporal forecasting network (STAFN), and an auxiliary learning task, which draws inspiration from Masking AutoEncoders to enhance the robustness of spatio-temporal representation learning. Empirical evaluations on real-world datasets demonstrate that GeoMAE significantly outperforms existing benchmarks, achieving up to 13.20% relative improvement over the best baseline models. Introduction Spatio-temporal representation learning has emerged as a pivotal research area, underpinning various intelligent applications in smart cities that play crucial roles across multiple domains. For instance, precise weather forecasting can significantly mitigate the detrimental impacts of natural disasters through early prevention; advanced traffic prediction systems help optimize traffic flow and substantially reduce congestion; environmental monitoring enables rapid identification of pollution hotspots within urban environments.
- Asia > China > Beijing > Beijing (0.05)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- (3 more...)
High-Rank Matrix Completion and Clustering under Self-Expressive Models
We propose efficient algorithms for simultaneous clustering and completion of incomplete high-dimensional data that lie in a union of low-dimensional subspaces. We cast the problem as finding a completion of the data matrix so that each point can be reconstructed as a linear or affine combination of a few data points. Since the problem is NP-hard, we propose a lifting framework and reformulate the problem as a group-sparse recovery of each incomplete data point in a dictionary built using incomplete data, subject to rank-one constraints. To solve the problem efficiently, we propose a rank pursuit algorithm and a convex relaxation. The solution of our algorithms recover missing entries and provides a similarity matrix for clustering. Our algorithms can deal with both low-rank and high-rank matrices, does not suffer from initialization, does not need to know dimensions of subspaces and can work with a small number of data points. By extensive experiments on synthetic data and real problems of video motion segmentation and completion of motion capture data, we show that when the data matrix is low-rank, our algorithm performs on par with or better than low-rank matrix completion methods, while for high-rank data matrices, our method significantly outperforms existing algorithms.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Jordan (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Poland > Lesser Poland Province > Kraków (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- North America > Canada (0.04)
- North America > United States (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > China > Hong Kong (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Poland > Lesser Poland Province > Kraków (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Spain > Galicia > Madrid (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Data Science > Data Quality (1.00)
- Information Technology > Data Science > Data Mining (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)